NLlex – a tool to generate lexical analyzers for natural language

Author

  • José João Dias de Almeida

Abstract

In this paper we present NLlex, a generator of natural language lexical analysis programs that resembles Unix lex extended with morphological analysis and other natural language (NL) elements. NLlex generates a C program that is linked with a morphological analyzer and with other modules to produce an NL processor. In particular, NLlex can generate modules that work:

  • as a lexico-morphological analyzer (to be called from yacc, NLyacc, btyacc, or any other module that needs it);
  • as a simple lexical processing tool.

NLlex can also handle, and be tuned to, the frequently occurring non-textual elements (markup elements, LaTeX-like constructs, dates, quotes, ...). An interface between NLlex and Prolog has been developed, and a Perl interface is under development.
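
The abstract does not show NLlex's specification syntax or the code it generates, so the following C sketch only illustrates the general architecture it describes: a scanner whose word tokens are handed to a morphological analyzer, and which also recognizes non-textual elements such as dates and markup. Everything here is an assumption for illustration. The functions `lexer_init`, `next_token` and `morph_lookup`, the three-entry lexicon standing in for a real morphological analyzer, and the `dd/dd/dddd` date pattern are hypothetical and are not part of NLlex.

```c
/* Illustrative sketch only: NOT NLlex input syntax and NOT NLlex-generated
 * code. It shows a scanner whose word tokens go through a morphological
 * analyzer, plus recognition of non-textual elements (dates, markup).
 * morph_lookup() and its lexicon are hypothetical stand-ins. */
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

enum tok_kind { TK_EOF, TK_WORD, TK_DATE, TK_MARKUP, TK_OTHER };

typedef struct {
    enum tok_kind kind;
    char text[64];     /* surface form (truncated to 63 chars) */
    const char *lemma; /* set for known words, NULL otherwise */
    const char *cat;   /* morphological category, "?" if unknown */
} token;

typedef struct {
    const char *src; /* NUL-terminated input text */
    size_t pos;      /* offset of the next unread character */
} lexer;

/* Hypothetical stand-in for a real morphological analyzer. */
struct entry { const char *form, *lemma, *cat; };
static const struct entry LEXICON[] = {
    {"the", "the", "DET"}, {"cats", "cat", "N"}, {"ran", "run", "V"},
};

static void morph_lookup(token *t) {
    for (size_t i = 0; i < sizeof LEXICON / sizeof LEXICON[0]; i++) {
        if (strcmp(t->text, LEXICON[i].form) == 0) {
            t->lemma = LEXICON[i].lemma;
            t->cat = LEXICON[i].cat;
            return;
        }
    }
    t->lemma = NULL; /* unknown word */
    t->cat = "?";
}

/* Matches the assumed date pattern dd/dd/dddd at s. It stops at the first
 * mismatch, so it never reads past the terminating NUL. */
static int is_date(const char *s) {
    static const char pat[] = "dd/dd/dddd";
    for (size_t i = 0; pat[i] != '\0'; i++) {
        int ok = pat[i] == 'd' ? isdigit((unsigned char)s[i]) : s[i] == pat[i];
        if (!ok) return 0;
    }
    return 1;
}

static void copy_span(token *t, const char *s, size_t n) {
    if (n >= sizeof t->text) n = sizeof t->text - 1;
    memcpy(t->text, s, n);
    t->text[n] = '\0';
}

void lexer_init(lexer *lx, const char *src) {
    assert(lx != NULL && src != NULL);
    lx->src = src;
    lx->pos = 0;
}

/* Returns the next token; word tokens are run through morph_lookup(). */
token next_token(lexer *lx) {
    token t = {TK_EOF, "", NULL, NULL};
    const char *s = lx->src + lx->pos;
    const char *gt;
    size_t n = 0;

    while (isspace((unsigned char)*s)) s++;
    if (*s == '\0') {
        /* end of input: TK_EOF with empty text */
    } else if (is_date(s)) {
        t.kind = TK_DATE;
        n = 10; /* length of dd/dd/dddd */
    } else if (*s == '<' && (gt = strchr(s, '>')) != NULL) {
        t.kind = TK_MARKUP; /* naive: everything up to the next '>' */
        n = (size_t)(gt - s) + 1;
    } else if (isalpha((unsigned char)*s)) {
        t.kind = TK_WORD;
        while (isalpha((unsigned char)s[n])) n++;
    } else {
        t.kind = TK_OTHER; /* any other single character */
        n = 1;
    }
    copy_span(&t, s, n);
    if (t.kind == TK_WORD) morph_lookup(&t);
    lx->pos = (size_t)(s - lx->src) + n;
    return t;
}
```

For the input `the cats ran <b> 12/03/2007`, repeated calls to `next_token` yield three word tokens annotated with lemmas and categories, one markup token, one date token, and then `TK_EOF`. A yacc-generated parser obtains its tokens by repeatedly calling `yylex()`, so a thin wrapper around a function like `next_token` could serve that role, passing the lemma and category through `yylval`. Note that `isalpha` in the default C locale only covers ASCII letters; real NL text would need locale- or UTF-8-aware character classes.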

Similar resources

A Computational Lexicon of Contemporary Hebrew

Computational lexicons are among the most important resources for natural language processing (NLP). Their importance is even greater in languages with rich morphology, where the lexicon is expected to provide morphological analyzers with enough information to enable them to correctly process intricately inflected forms. We describe the Haifa Lexicon of Contemporary Hebrew, the broadest-coverag...

Design and Implementation of an Intelligent Part of Speech Generator

The aim of this paper is to report on an attempt to design and implement an intelligent system capable of generating the correct part of speech for a given sentence while the sentence is totally new to the system and not stored in any database available to the system. It follows the same steps a normal individual does to provide the correct parts of speech using a natural language processor. It...

General Incremental Lexical Analysis

We present the first fully general approach to the problem of incremental lexical analysis. Our approach utilizes existing generators of (batch) lexical analyzers to derive the information needed by an incremental run-time system. No changes to the generator’s algorithms or run-time mechanism are required. The entire pattern language of the original tool is supported, including such features as...

Comparing Lexical Bundles in Hard Science Lectures; A Case of Native and Non-Native University Lecturers

Researchers stated that learning and applying a certain set of lexical bundles of native lecturers by non-native lecturers would help students improve their proficiency through incidental vocabulary input. The present study shed light on the lexical bundles in hard science lectures used by Native and Non-native lecturers in international universities with the main purpose of analyzing the structu...

Producing a Persian Text Tokenizer Corpus Focusing on Its Computational Linguistics Considerations

The main task of tokenization is to divide the sentences of a text into their constituent units and remove punctuation marks (dots, commas, etc.). Each unit is a continuous lexical or grammatical writing chain that is an independent semantic unit. Tokenization occurs at the word level and the extracted units can be used as input to other components such as stemmer. The requirement to create...

Publication date: 2007